The Treatment of Compounds in a Morphological Component for Speech Recognition
Abstract
This paper describes a morphological component of a speech recognition system for German. The component constructs complex word form hypotheses from a lattice of simplex forms; our example is the recognition of compounds from their individual constituents. Evaluation results are presented for speech recognition with and without morphologically based word recognition.

1 Goals and motivation

This paper[1] proposes a strategy for partially satisfying the growing demands on speech recognition systems (e.g. large vocabulary recognition, few domain restrictions, robustness, and unknown word recognition) by integrating morphological knowledge into the speech recognition process.

Current stochastic word recognizers have, for example, certain difficulties with compound word forms. Compounds can be defined as words which are built compositionally from other words or stems of words that can occur as free forms. Examples of German compounds are Arzttermin (constituents: Arzt, Termin), Arbeitsamt (constituents: Arbeit, Amt), and Wochenendtermin (constituents: Woche, Ende, Termin). Compounding is a frequent phenomenon in spontaneous speech: in the current VERBMOBIL transliteration corpus of 172672 wordform tokens and the related lexical database of 4514 wordform types, the token frequency of compounds is 11% and the type frequency is 36%.
Both compounds and their individual constituents were included in the recognition dictionary, and most of the compounds as well as their individual constituents (the latter in almost all of their possible inflected forms) occurred in the output lattice of the stochastic word recognition system (cf. Hübener et al., 1996). A dictionary of this kind is highly redundant; large dictionaries reduce the speed of the stochastic word recognition, and in view of the infinite number of potential out-of-vocabulary compounds, an exhaustive lexical listing is simply not feasible.

For the task of recognizing out-of-vocabulary words, the employment of phonotactic constraints on well-formed syllable structures has already been tested; see e.g. Jusek et al. (1994). Since complex words consist of units which are members of a finite set of morphs, it is also possible to specify morphotactic rules which operate on this finite morph lexicon to derive complex word forms. The set of actual morphs (those which are lexicalized in a morph lexicon) is only a subset of the set of potential morphs (those which satisfy the phonotactic constraints). Thus an integration of morphological knowledge leads to more specific constraints on out-of-vocabulary complex word forms.

Occurrences of discontinuous ("split") word forms are a further problem in recognizing spontaneous speech. These often cannot be detected by speech recognition systems because their phonological material is torn apart by slips of the tongue, repetitions, pauses or other insertions. An analysis of split word forms in our corpus demonstrated that most are compounds split at morphological boundaries. Although split compounds are not easily recognized by stochastic word recognition systems, their constituents are, and they can be recombined using morphological and phonological knowledge (cf. Lüngen et al., 1996).

Thus, our morphological component is designed to achieve the following goals:

1. To reduce the size of the word recognizer dictionary through the recognition of lexicalized compounds from their individual constituents,
2. To prepare the ground for robust morphologically based recognition of out-of-vocabulary words.

[1] This paper was originally published in Dafydd Gibbon (ed.): Natural Language Processing and Speech Technology. Results of the 3rd KONVENS Conference. Bielefeld, October 1996, pp. 71-76. Berlin, etc.: Mouton de Gruyter.

2 Speech recognition with online morphology: Architecture and interfaces

In order to explore the use of morphological decomposition in the speech recognition process, two different architectures were tested. Figure 1 shows the speech recognition architecture without morphology, and Figure 2 presents the integration of our morphological component.[2]

[Figure 1: Standard incremental speech recognition architecture. Diagram labels: Signal, WORD RECOGNITION, Word Graph, SYNTAX, Hypothesis.]

[Figure 2: Standard incremental speech recognition architecture with morphology. Diagram labels: Signal, WORD RECOGNITION, Word Graph, MORPHOLOGY, Word Graph, SYNTAX, Hypothesis, Hypothesis.]

The interfaces of the online morphology, word hypothesis graphs (WHGs), correspond exactly to the existing interface between the stochastic word recognition component and the syntactic component, so no interface specifications of associated components had to be changed. Operations on the WHG in the morphological component add new information by inserting new word hypotheses containing new compounds and confidence values, and the resulting WHG is transmitted to the higher components.

[2] The experimental communication model for speech recognition used in the VERBMOBIL subproject 15 is INTARC, cf. Amtrup (1995).
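The WHG operation described in Section 2 (adding a compound hypothesis without removing the existing ones) might be sketched as follows. This is only an illustration: the `WordHypothesis` fields, the `add_compound` helper, and the use of the minimum as the combined confidence value are assumptions made for the example, not details given in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHypothesis:
    """One edge of a word hypothesis graph (field names are hypothetical)."""
    start: int         # node where the hypothesis begins
    end: int           # node where the hypothesis ends
    word: str          # recognized word form
    confidence: float  # score assigned to the hypothesis


def add_compound(whg, first, second, compound):
    """Return a new WHG with a compound hypothesis spanning two adjacent edges.

    The original hypotheses are kept, so the graph only gains information.
    """
    if first.end != second.start:
        raise ValueError("hypotheses are not adjacent in the graph")
    # Assumption: the compound inherits the weaker of the two confidences.
    combined = min(first.confidence, second.confidence)
    new_edge = WordHypothesis(first.start, second.end, compound, combined)
    return whg + [new_edge]


arzt = WordHypothesis(0, 1, "Arzt", 0.9)
termin = WordHypothesis(1, 2, "Termin", 0.8)
whg = add_compound([arzt, termin], arzt, termin, "Arzttermin")
```

Because the result has the same shape as the input graph, a downstream component that already reads WHGs would not need a new interface, which is the property the section emphasizes.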
3 Reducing the recognition dictionary

Geutner (1995) mentions a degrading of the acoustic part of her word recognizer when using morph dictionaries containing affixes; this was predictable, however, since these are phonetically very small and often unstressed units. Though our morphological model (see Section 4) allows for the additional treatment of inflection and derivation, we have initially restricted our attention to morphological composition since word-sized linguistic units are involved.

The test vocabulary covered by the speech recognition system, a small wordlist of 470 wordforms, was reduced by 20% by splitting the 142 compounds into their constituents. This resulted in a list of 389 simplex words as potential compound constituents. This reduction rate will increase with increasing corpus size. The word recognition component was subsequently trained on the reduced dictionary.

4 Modelling

Finite-state automata have been established as adequate and efficient models for describing phenomena in the area of morphology (cf. Kay, 1987; Kaplan and Kay, 1994; Sproat, 1992). Our compositional morphotactics is encoded in a Finite State Network (FSN). Since a WHG is, in effect, an FSN, the task of a morphological lattice parser is simply to find an intersection between two FSNs (cf. Kaplan and Kay, 1994).

In the current network designed for the construction of compounds from their individual constituents, the arcs of the network are labelled with the stem forms of the compound constituents. We thus generalize over all possible inflected forms of one stem including those found in compounds (i.e. Modifier+Interfix). The employment of an independent lexical knowledge source permits the requirement of strict string identity of a path label in the compositional morphology network with a path label in a WHG to be relaxed. Compare the following WHG extract containing inflected forms:

0 1 Termine 1.0 300 500
1 4 Kalenders 1.0 80
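As an illustration of the lattice parsing idea in Section 4, the sketch below walks a WHG and a small morphotactic network in parallel, which amounts to a simplified intersection of the two networks. The names `STEM_LEXICON`, `MORPHOTACTICS`, `FINAL_STATES` and `find_compounds` are hypothetical, the network contents are invented for the example, and the compound Termin+Kalender is only suggested by the extract above. WHG edges are reduced to (start node, end node, word form); the remaining columns of the extract are left out because their meaning is not stated here.

```python
# Hypothetical stem lexicon: inflected word form -> stem.
STEM_LEXICON = {
    "Termin": "Termin", "Termine": "Termin",
    "Kalender": "Kalender", "Kalenders": "Kalender",
    "Arzt": "Arzt",
}

# Hypothetical morphotactic network: state -> {stem label: next state}.
MORPHOTACTICS = {
    0: {"Arzt": 1, "Termin": 2},
    1: {"Termin": 3},
    2: {"Kalender": 3},
}
FINAL_STATES = {3}


def find_compounds(edges):
    """Return (start node, end node, stems) for every WHG path accepted
    by the morphotactic network. Assumes the WHG is acyclic."""
    by_start = {}
    for start, end, word in edges:
        by_start.setdefault(start, []).append((end, word))

    results = []

    def walk(node, state, origin, stems):
        if state in FINAL_STATES:
            results.append((origin, node, tuple(stems)))
        for end, word in by_start.get(node, []):
            # Match on the stem, not on the surface string, so that
            # inflected forms in the WHG still fit the network labels.
            stem = STEM_LEXICON.get(word)
            next_state = MORPHOTACTICS.get(state, {}).get(stem)
            if next_state is not None:
                walk(end, next_state, origin, stems + [stem])

    for origin in sorted(by_start):
        walk(origin, 0, origin, [])
    return results


edges = [(0, 1, "Termine"), (1, 4, "Kalenders"), (0, 1, "Arzt")]
compounds = find_compounds(edges)
```

Looking words up in the stem lexicon before comparing them with arc labels is what relaxes the strict string identity requirement: the inflected forms Termine and Kalenders match arcs labelled Termin and Kalender.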